Objectives: The main objective of this work is to collect data from Uber's Open API for a week, requesting information about rides from a single starting point to one point in each neighborhood of Natal, the capital of Rio Grande do Norte, Brazil.
Group components:
The Open Uber API (https://developer.uber.com/) is limited to 2000 requests per hour. The city of Natal has 36 neighborhoods, so each set of requests consists of 36 requests, one per neighborhood. In principle we could make
$$ 2000 / 36 = 55.55 $$
that is, 55 sets of requests per hour. However, we configured the data collector to execute one set every 90 seconds (40 sets per hour), which limits us to 1440 requests per hour.
The data was collected from 29/11/2017 until 08/12/2017, every 90 seconds, generating about 34 MB of data and 235779 requests. With this configuration, the data collector could have collected about $$ 10 \text{ days} \times 24 \text{ hours} \times 1440 \text{ requests} = 345600 \text{ lines} $$ so we reached $235779/345600 = 68.22\%$ data collection efficiency. Unfortunately, the computer collecting the data was my laptop, and the collector did not receive data on some days.
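The arithmetic above can be checked with a short sketch. The constant names are illustrative only; the figures are the ones quoted in the text.

```python
# Figures quoted in the text; the variable names are illustrative only.
HOURLY_LIMIT = 2000        # API requests allowed per hour
NEIGHBORHOODS = 36         # one request per neighborhood in each set
INTERVAL_SECONDS = 90      # one set of requests every 90 seconds
DAYS = 10                  # length of the collection window
COLLECTED = 235779         # requests actually stored

# Upper bound on complete sets per hour allowed by the API limit
max_sets_per_hour = HOURLY_LIMIT // NEIGHBORHOODS
# Sets actually issued per hour with a 90-second interval
sets_per_hour = 3600 // INTERVAL_SECONDS
requests_per_hour = sets_per_hour * NEIGHBORHOODS
# Lines expected if the collector had run without interruption
expected = DAYS * 24 * requests_per_hour
efficiency = round(100 * COLLECTED / expected, 2)

print(max_sets_per_hour, requests_per_hour, expected, efficiency)
```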
How did we organize everything? We created a simple Python collector that uses the Uber Rides library. To install it, simply run
!pip install uber-rides
After that, all you have to do is create a user and configure a new application on Uber's developer website. More information about Uber Rides can be found here: https://developer.uber.com/docs/riders/ride-requests/tutorials/api/python
Importing the Uber Rides library
from uber_rides.session import Session
from uber_rides.client import UberRidesClient
After this we collected some data to learn how the API works and persisted the results in a comma-separated values (CSV) file. This CSV is loaded in this kernel and we start to play! :] You can check the image below.
Below is the code that collected all the data.
In [18]:
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import IPython
with open('../consumer.py') as f:
    code = f.read()
formatter = HtmlFormatter()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'),
    highlight(code, PythonLexer(), formatter)))
Out[18]:
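The collector's source is not reproduced in this export. As a rough idea of what one collection round could look like, here is a minimal sketch; it is not the code in consumer.py. The helper names (`collect_once`, `append_rows`, `fake_fetch`), the column list and the starting point are assumptions made for illustration. With the Uber Rides library, the `fetch_prices` callable would presumably wrap `client.get_price_estimates(...)` and read the `prices` list from the response, as described in the library's README.

```python
import csv
import io
from datetime import datetime

# Assumed column layout for the CSV file
FIELDS = ['timestamp', 'neighborhood', 'product_id', 'low_estimate', 'high_estimate']

def collect_once(fetch_prices, origin, destinations, now):
    """Request one set of price estimates: one call per destination neighborhood.

    fetch_prices(origin, destination) is a hypothetical callable returning
    a list of dicts, one per Uber product.
    """
    rows = []
    for name, point in destinations.items():
        for price in fetch_prices(origin, point):
            rows.append({
                'timestamp': now.isoformat(),
                'neighborhood': name,
                'product_id': price['product_id'],
                'low_estimate': price['low_estimate'],
                'high_estimate': price['high_estimate'],
            })
    return rows

def append_rows(handle, rows):
    """Write the rows of one collection round to an open CSV handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writerows(rows)

def fake_fetch(origin, destination):
    """Stand-in for the real API call, so the sketch runs offline."""
    return [{'product_id': 'demo-product', 'low_estimate': 10.0, 'high_estimate': 14.0}]

origin = (-5.83, -35.21)  # made-up starting point
destinations = {
    'Alecrim': (-5.796158, -35.216566),
    'Tirol': (-5.791699, -35.197358),
}
rows = collect_once(fake_fetch, origin, destinations, datetime(2017, 11, 29, 8, 0, 0))

buffer = io.StringIO()
append_rows(buffer, rows)
print(buffer.getvalue())
```

In a real collector this round would be repeated in a loop, sleeping 90 seconds between rounds and appending to a file instead of an in-memory buffer.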
In [2]:
#Jupyter Magic word to inline matplotlib plots
%matplotlib inline
#System libraries
import os
import sys
#Basic libraries for data analysis
import numpy as np
from numpy import random
import pandas as pd
#Choropleth necessary libraries
##GeoJson data
import json
##Necessary to create shapes in folium
from shapely.geometry import Polygon
from shapely.geometry import Point
##Choropleth itself
import folium
##Colormap
from branca.colormap import linear
#Plot
import matplotlib
import matplotlib.pyplot as plt
Importing data from Natal's neighborhoods in GeoJson format
In [4]:
# path to the GeoJSON file with Natal's neighborhoods
natal_neigh = os.path.join('geojson', 'natal.geojson')
# load the data using UTF-8 encoding; the with-block closes the file afterwards
with open(natal_neigh, encoding='UTF-8') as geojson_file:
    geo_json_natal = json.load(geojson_file)
In [5]:
neighborhood = []
# list all neighborhoods
for neigh in geo_json_natal['features']:
    neighborhood.append(neigh['properties']['name'])
Loading the data collected from Uber. The data was stored in CSV format.
In [3]:
uberJsonRequests = pd.read_csv('../uberData/uberRidesRequests.csv')
In [6]:
uberJsonRequests.head(3)
Out[6]:
The data returned by the Uber API contains no mean estimate, so we created one as the arithmetic mean of the low and high estimates: $$ Estimate_{Mean} = \dfrac{Estimate_{High}+Estimate_{Low}}{2} $$
In [7]:
uberJsonRequests['mean_estimate'] = (uberJsonRequests['low_estimate']+uberJsonRequests['high_estimate'])/2
Another piece of information derived from the timestamp feature is the period of the day, which was divided into morning, afternoon and evening. The aim is to see how ride prices relate to the period of the day.
In [9]:
uberJsonRequests['periodOfDay'].unique()
Out[9]:
In [11]:
uberJsonRequests['timestamp'] = pd.to_datetime(uberJsonRequests['timestamp'])
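The CSV already contains the periodOfDay column, so the following is only a sketch of how such a label could be derived from a timestamp. The function name `period_of_day` is hypothetical. The morning (6 AM to noon) and evening (6 PM to 6 AM) bounds are the ones stated later in this notebook; the afternoon bounds (noon to 6 PM) are an assumption inferred from them.

```python
from datetime import datetime

def period_of_day(ts):
    """Classify a timestamp as 'morning', 'afternoon' or 'evening'."""
    if 6 <= ts.hour < 12:
        return 'morning'      # 6 AM up to noon
    if 12 <= ts.hour < 18:
        return 'afternoon'    # noon up to 6 PM (assumed bounds)
    return 'evening'          # 6 PM up to 6 AM

print(period_of_day(datetime(2017, 11, 29, 7, 30)))
print(period_of_day(datetime(2017, 11, 29, 13, 0)))
print(period_of_day(datetime(2017, 11, 29, 23, 45)))
```

Since pandas timestamps also expose an hour attribute, the same function could be applied to the converted timestamp column with apply.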
According to Wikipedia (https://pt.wikipedia.org/wiki/Lista_de_bairros_de_Natal_(Rio_Grande_do_Norte)), Natal is divided into four administrative regions: South, East, West and North.
In [14]:
# Neighborhoods of each administrative region
REGIONS = {
    'East': ['Alecrim', 'Areia Preta', 'Barro Vermelho', 'Cidade Alta', 'Lagoa Seca', 'Mãe Luiza', 'Petrópolis', 'Praia do Meio', 'Ribeira', 'Rocas', 'Santos Reis', 'Tirol'],
    'North': ['Igapó', 'Lagoa Azul', 'Nossa Senhora da Apresentação', 'Pajuçara', 'Potengi', 'Redinha', 'Salinas'],
    'South': ['Candelária', 'Capim Macio', 'Lagoa Nova', 'Neópolis', 'Nova Descoberta', 'Pitimbu', 'Ponta Negra'],
    'West': ['Bom Pastor', 'Cidade da Esperança', 'Cidade Nova', 'Dix-Sept Rosado', 'Felipe Camarão', 'Guarapes', 'Nordeste', 'Nossa Senhora de Nazaré', 'Planalto', 'Quintas'],
}
# Reverse lookup: neighborhood name -> region
NEIGHBORHOOD_REGION = {name: region for region, names in REGIONS.items() for name in names}

def label_region(row):
    # Return the region of the row's neighborhood, or '' if it is unknown
    return NEIGHBORHOOD_REGION.get(row['neighborhood'], '')
In [15]:
uberJsonRequests['region'] = uberJsonRequests.apply(label_region, axis=1)
In [16]:
uberJsonRequests
Out[16]:
Now we can comment on every feature of the dataset.
In [27]:
uberJsonRequests.info()
In [30]:
# groupby yields a Series with a (product_id, neighborhood) MultiIndex, so uberProducts[product_id][neighborhood] works below
uberProducts = uberJsonRequests.groupby(['product_id', 'neighborhood'])['mean_estimate'].mean()
uberProducts
Out[30]:
In [31]:
# groupby yields a Series with a (periodOfDay, neighborhood) MultiIndex, so uberPeriodOfDay[period][neighborhood] works below
uberPeriodOfDay = uberJsonRequests.groupby(['periodOfDay', 'neighborhood'])['mean_estimate'].mean()
uberPeriodOfDay
Out[31]:
Finally, we plot all the collected data, showing the mean_estimate values by timestamp. The plot makes clear on which days the collector did not receive data. Another thing we noticed is how expensive a trip can be.
In [58]:
plt.figure(figsize=(20,10))
plt.plot(uberJsonRequests['timestamp'],uberJsonRequests['mean_estimate'])
Out[58]:
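The interruptions visible in the plot can also be located numerically. The sketch below is one possible way to do it, using only the standard library; the function name `find_gaps` and the 10-minute threshold are assumptions, and the sample timestamps are made up.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(minutes=10)):
    """Return (start, end) pairs where consecutive samples are further apart than max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]

# Made-up samples: three rounds 90 seconds apart, then a long pause
samples = [
    datetime(2017, 11, 29, 8, 0, 0),
    datetime(2017, 11, 29, 8, 1, 30),
    datetime(2017, 11, 29, 8, 3, 0),
    datetime(2017, 11, 30, 9, 0, 0),
]
gaps = find_gaps(samples)
print(gaps)
```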
In [60]:
plt.plot(uberJsonRequests[uberJsonRequests['high_estimate']>100]['timestamp'],uberJsonRequests[uberJsonRequests['high_estimate']>100]['mean_estimate'])
Out[60]:
Trips
In [68]:
uberJsonRequests[uberJsonRequests['high_estimate']>200]
Out[68]:
After adjusting our data, we are now ready to build our choropleth.
As we noticed, we could make one set of values based on the Uber products, UberX and UberSelect, and another based on the period of the day: morning, afternoon and evening. For each Uber product or period of the day we can use a colormap to see how the trips are distributed in terms of price.
In [10]:
colormap = linear.GnBu.scale(uberJsonRequests['low_estimate'].mean(), uberJsonRequests['high_estimate'].mean())
colormap
Out[10]:
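To give an intuition of what a linear colormap does, here is a simplified two-color sketch. It is not branca's implementation, which interpolates across several color stops; the function names and the two default colors are arbitrary picks for illustration.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def linear_color(value, vmin, vmax, low='#f7fcf0', high='#084081'):
    """Interpolate linearly between two colors; values outside [vmin, vmax] are clamped.

    Assumes vmin < vmax.
    """
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    low_rgb, high_rgb = hex_to_rgb(low), hex_to_rgb(high)
    mixed = [round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb)]
    return '#{:02x}{:02x}{:02x}'.format(*mixed)

# Cheapest, mid-range and most expensive estimates on a made-up 10..40 scale
print(linear_color(10, 10, 40), linear_color(25, 10, 40), linear_color(40, 10, 40))
```

The lower bound maps to the light color and the upper bound to the dark one, which is how the cheaper and more expensive neighborhoods are told apart on the maps.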
In [39]:
colorscaleUberX = linear.BuPu.scale(uberProducts['65cb1829-9761-40f8-acc6-92d700fe2924'].min(), uberProducts['65cb1829-9761-40f8-acc6-92d700fe2924'].max())
colorscaleUberX
Out[39]:
In [40]:
colorscaleUberSelect = linear.BuPu.scale(uberProducts['bf8f99ca-f5f2-40d4-8ffc-52f1e2b17138'].min(), uberProducts['bf8f99ca-f5f2-40d4-8ffc-52f1e2b17138'].max())
colorscaleUberSelect
Out[40]:
In [50]:
colorscaleMorning = linear.BuGn.scale(uberPeriodOfDay['morning'].min(), uberPeriodOfDay['morning'].max())
colorscaleMorning
Out[50]:
In [51]:
colorscaleAfternoon = linear.OrRd.scale(uberPeriodOfDay['afternoon'].min(), uberPeriodOfDay['afternoon'].max())
colorscaleAfternoon
Out[51]:
In [52]:
colorscaleEvening = linear.PuBu.scale(uberPeriodOfDay['evening'].min(), uberPeriodOfDay['evening'].max())
colorscaleEvening
Out[52]:
Observe the different bounds for each period of the day. The highest estimate in the evening (which goes from 6 PM to 6 AM) is more expensive than the highest estimate in the morning (6 AM to noon). The morning maximum, as a percentage of the evening maximum, is
$$ \left(\dfrac{Max_{Morning}}{Max_{Evening}}\right) \times 100 $$
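The comparison can be computed as in the sketch below. The numbers are hypothetical and are not values from the dataset; they only show how the percentage in the formula above relates to the statement that the evening is more expensive.

```python
# Hypothetical per-neighborhood mean estimates; NOT values from the dataset.
estimates = {
    'morning': [10.0, 18.0, 30.0],
    'evening': [12.0, 22.0, 40.0],
}

max_morning = max(estimates['morning'])
max_evening = max(estimates['evening'])

# Morning maximum as a percentage of the evening maximum (the formula above)
ratio_percent = max_morning / max_evening * 100
# How much more expensive the evening maximum is, relative to the morning one
increase_percent = (max_evening / max_morning - 1) * 100

print(round(ratio_percent, 1), round(increase_percent, 1))
```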
In [46]:
# Create a map object
uberMap = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=12,
    tiles='cartodbpositron',
    height=800
)
# UberX
folium.GeoJson(
    geo_json_natal,
    name='UberX Price Estimates',
    style_function=lambda feature: {
        'fillColor': colorscaleUberX(uberProducts['65cb1829-9761-40f8-acc6-92d700fe2924'][feature['properties']['name']]),
        'color': 'blue',
        'weight': 0.5,
        'dashArray': '5, 5',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMap)
# UberSelect
folium.GeoJson(
    geo_json_natal,
    name='UberSelect Price Estimates',
    style_function=lambda feature: {
        'fillColor': colorscaleUberSelect(uberProducts['bf8f99ca-f5f2-40d4-8ffc-52f1e2b17138'][feature['properties']['name']]),
        'color': 'blue',
        'weight': 0.5,
        'dashArray': '5, 5',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMap)
# Add one legend per color scale, then the layer switcher
colorscaleUberX.caption = 'UberX Price Estimates'
colorscaleUberX.add_to(uberMap)
colorscaleUberSelect.caption = 'UberSelect Price Estimates'
colorscaleUberSelect.add_to(uberMap)
folium.LayerControl().add_to(uberMap)
uberMap
Out[46]:
In [54]:
# Create a map object
uberMapPeriod = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=12,
    tiles='cartodbpositron'
)
# Morning
folium.GeoJson(
    geo_json_natal,
    name='Morning Price Estimates',
    style_function=lambda feature: {
        'fillColor': colorscaleMorning(uberPeriodOfDay['morning'][feature['properties']['name']]),
        'color': 'green',
        'weight': 0.5,
        'dashArray': '1, 1',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMapPeriod)
# Afternoon
folium.GeoJson(
    geo_json_natal,
    name='Afternoon Price Estimates',
    style_function=lambda feature: {
        'fillColor': colorscaleAfternoon(uberPeriodOfDay['afternoon'][feature['properties']['name']]),
        'color': 'orange',
        'weight': 0.5,
        'dashArray': '1, 1',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMapPeriod)
# Evening (the layer name previously duplicated the afternoon one)
folium.GeoJson(
    geo_json_natal,
    name='Evening Price Estimates',
    style_function=lambda feature: {
        'fillColor': colorscaleEvening(uberPeriodOfDay['evening'][feature['properties']['name']]),
        'color': 'black',
        'weight': 0.5,
        'dashArray': '1, 1',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMapPeriod)
# Add one legend per color scale, then the layer switcher
colorscaleMorning.caption = 'Morning Price Estimates'
colorscaleMorning.add_to(uberMapPeriod)
colorscaleAfternoon.caption = 'Afternoon Price Estimates'
colorscaleAfternoon.add_to(uberMapPeriod)
colorscaleEvening.caption = 'Evening Price Estimates'
colorscaleEvening.add_to(uberMapPeriod)
folium.LayerControl().add_to(uberMapPeriod)
uberMapPeriod
Out[54]:
In [70]:
# Mean estimate per neighborhood; the raw mean_estimate column is indexed by row number, not by neighborhood name
neighborhoodMeans = uberJsonRequests.groupby('neighborhood')['mean_estimate'].mean()
# Create a map object
uberMapGeneral = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=12,
    tiles='cartodbpositron',
    height=800
)
folium.GeoJson(
    geo_json_natal,
    name='Price Estimates',
    style_function=lambda feature: {
        'fillColor': colormap(neighborhoodMeans[feature['properties']['name']]),
        'color': 'blue',
        'weight': 0.5,
        'dashArray': '5, 5',
        'fillOpacity': 0.8,
        'name': feature['properties']['name']
    }
).add_to(uberMapGeneral)
colormap.caption = 'Price Estimates'
colormap.add_to(uberMapGeneral)
folium.LayerControl().add_to(uberMapGeneral)
#folium.Marker([-5.796158, -35.216566], popup='Alecrim, Mean value: 21.177857').add_to(uberMapGeneral)
#folium.Marker([-5.789477, -35.188454], popup='Areia Preta, Mean value: 23.977857').add_to(uberMapGeneral)
#folium.Marker([-5.800741, -35.211056], popup='Barro Vermelho, Mean value: 18.666429').add_to(uberMapGeneral)
#folium.Marker([-5.813764, -35.240721], popup='Bom Pastor, Mean value: 19.228393').add_to(uberMapGeneral)
#folium.Marker([-5.841737, -35.210463], popup='Candelária, Mean value: 11.591687').add_to(uberMapGeneral)
#folium.Marker([-5.862753, -35.195597], popup='Capim Macio, Mean value: 16.732264').add_to(uberMapGeneral)
#folium.Marker([-5.785291, -35.206464], popup='Cidade Alta, Mean value: 23.384464').add_to(uberMapGeneral)
#folium.Marker([-5.825376, -35.242751], popup='Cidade da Esperança, Mean value: 23.109185').add_to(uberMapGeneral)
#folium.Marker([-5.834856, -35.242523], popup='Cidade Nova, Mean value: 20.746426').add_to(uberMapGeneral)
#folium.Marker([-5.812036, -35.223915], popup='Dix-Sept Rosado, Mean value: 16.956429').add_to(uberMapGeneral)
#folium.Marker([-5.824374, -35.250070], popup='Felipe Camarão, Mean value: 23.978914').add_to(uberMapGeneral)
#folium.Marker([-5.841580, -35.274691], popup='Guarapes, Mean value: 36.411187').add_to(uberMapGeneral)
#folium.Marker([-5.769009, -35.254755], popup='Igapó, Mean value: 33.120807').add_to(uberMapGeneral)
#folium.Marker([-5.734177, -35.253802], popup='Lagoa Azul, Mean value: 45.140514').add_to(uberMapGeneral)
#folium.Marker([-5.819743, -35.212920], popup='Lagoa Nova, Mean value: 12.775357').add_to(uberMapGeneral)
#folium.Marker([-5.809191, -35.209374], popup='Lagoa Seca, Mean value: 15.975205').add_to(uberMapGeneral)
#folium.Marker([-5.794771, -35.188619], popup='Mãe Luiza, Mean value: 23.746786').add_to(uberMapGeneral)
#folium.Marker([-5.870513, -35.208176], popup='Neópolis, Mean value: 19.888275').add_to(uberMapGeneral)
#folium.Marker([-5.796215, -35.245141], popup='Nordeste, Mean value: 27.215000').add_to(uberMapGeneral)
#folium.Marker([-5.763654, -35.282543], popup='Nossa Senhora da Apresentação, Mean value: 38.594575').add_to(uberMapGeneral)
#folium.Marker([-5.815939, -35.229249], popup='Nossa Senhora de Nazaré, Mean value: 16.518571').add_to(uberMapGeneral)
#folium.Marker([-5.824830, -35.200026], popup='Nova Descoberta, Mean value: 9.996964').add_to(uberMapGeneral)
#folium.Marker([-5.751781, -35.234665], popup='Pajuçara, Mean value: 39.676061').add_to(uberMapGeneral)
#folium.Marker([-5.782001, -35.195196], popup='Petrópolis, Mean value: 23.243929').add_to(uberMapGeneral)
#folium.Marker([-5.876271, -35.224500], popup='Pitimbu, Mean value: 21.161262').add_to(uberMapGeneral)
#folium.Marker([-5.858102, -35.251586], popup='Planalto, Mean value: 28.573770').add_to(uberMapGeneral)
#folium.Marker([-5.877522, -35.176073], popup='Ponta Negra, Mean value: 21.856557').add_to(uberMapGeneral)
#folium.Marker([-5.758634, -35.247010], popup='Potengi, Mean value: 37.519636').add_to(uberMapGeneral)
#folium.Marker([-5.779198, -35.197163], popup='Praia do Meio, Mean value: 23.750624').add_to(uberMapGeneral)
#folium.Marker([-5.797290, -35.226006], popup='Quintas, Mean value: 22.788750').add_to(uberMapGeneral)
#folium.Marker([-5.742772, -35.205806], popup='Redinha, Mean value: 35.059579').add_to(uberMapGeneral)
#folium.Marker([-5.774943, -35.205578], popup='Ribeira, Mean value: 26.279643').add_to(uberMapGeneral)
#folium.Marker([-5.771832, -35.203091], popup='Rocas, Mean value: 26.428826').add_to(uberMapGeneral)
#folium.Marker([-5.763117, -35.247973], popup='Salinas, Mean value: 37.037460').add_to(uberMapGeneral)
#folium.Marker([-5.763226, -35.196786], popup='Santos Reis, Mean value: 27.564395').add_to(uberMapGeneral)
#folium.Marker([-5.791699, -35.197358], popup='Tirol, Mean value: 22.539286').add_to(uberMapGeneral)
uberMapGeneral
Out[70]: